Feature/add_midas_transformer #1668

Closed

Conversation

Beerstabr
Contributor

Fixes #1570

Summary

Added a transformer that converts a high-frequency time series into a lower-frequency one using the mixed-data sampling (MIDAS) approach, for example converting a monthly time series into a quarterly one. This lets you use e.g. a monthly time series as input to a model that forecasts a quarterly target series without losing as much information as you would by downsampling from monthly to quarterly with e.g. pandas.DataFrame.resample.
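
As a rough illustration of the reshaping described in the summary, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `midas_monthly_to_quarterly` and the `(year, month)` keyed dictionary are assumptions made only for this example.

```python
from collections import defaultdict


def midas_monthly_to_quarterly(monthly):
    """Reshape {(year, month): value} into {(year, quarter): [m1, m2, m3]}.

    Hypothetical helper: every quarter keeps its three monthly values as
    separate components instead of collapsing them into one number.
    """
    quarters = defaultdict(lambda: [None, None, None])
    for (year, month), value in sorted(monthly.items()):
        quarter = (month - 1) // 3 + 1   # 1..4
        position = (month - 1) % 3       # slot of the month inside its quarter
        quarters[(year, quarter)][position] = value
    return dict(quarters)


# Six months of made-up data: January to June 2020.
monthly = {(2020, m): float(m) for m in range(1, 7)}
quarterly = midas_monthly_to_quarterly(monthly)
print(quarterly)
```

A plain downsample would keep a single value per quarter (for instance the mean), whereas the sketch keeps all three monthly values as components of the quarterly series.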

Limitations:

  • Each period of the target frequency must always contain exactly the same number of periods of the original frequency for MIDAS to be possible. For example, there are always four quarters in a year, but the number of days in a month is not always the same.
  • MIDAS only works when you go from a high frequency to a lower frequency.
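
The first limitation can be checked with a small sketch. The helper name `is_midas_compatible` is hypothetical and only the standard library is used; it simply tests whether every low-frequency period contains the same number of high-frequency steps.

```python
import calendar
from collections import Counter


def is_midas_compatible(group_sizes):
    """True when every low-frequency period holds the same number of steps."""
    return len(set(group_sizes)) == 1


# Months per quarter: always 3, so monthly -> quarterly is possible.
months_per_quarter = list(Counter((m - 1) // 3 for m in range(1, 13)).values())

# Days per month in 2023: 28, 30 or 31, so daily -> monthly is not.
days_per_month = [calendar.monthrange(2023, m)[1] for m in range(1, 13)]

monthly_to_quarterly_ok = is_midas_compatible(months_per_quarter)
daily_to_monthly_ok = is_midas_compatible(days_per_month)
print(monthly_to_quarterly_ok, daily_to_monthly_ok)
```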

Possible improvements:

  • Aggregating columns to avoid an intermittent time series or to limit the increase in components. For example, when you go from seconds to minutes you'd get 60 columns holding data for each of the seconds in a minute. You could also aggregate this to four columns each representing 15 seconds.
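
A minimal sketch of that aggregation idea, assuming the mean is used to combine values (the comment above does not say which aggregation would be chosen, and `aggregate_components` is a hypothetical name):

```python
def aggregate_components(values, n_groups):
    """Collapse the components of one low-frequency step into n_groups means."""
    if len(values) % n_groups != 0:
        raise ValueError("number of components must be divisible by n_groups")
    size = len(values) // n_groups
    return [
        sum(values[i * size:(i + 1) * size]) / size
        for i in range(n_groups)
    ]


# One minute of made-up per-second values: 0, 1, ..., 59.
seconds_in_minute = list(range(60))

# Four components, each covering 15 seconds, instead of 60 components.
aggregated = aggregate_components(seconds_in_minute, n_groups=4)
print(aggregated)
```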

@madtoinou
Collaborator

@Beerstabr Some tests seem to fail because of None values, could you add the necessary logic to the MIDAS transformation so that these cases are properly handled?

@Beerstabr
Contributor Author

Beerstabr commented Mar 28, 2023 via email

@Beerstabr
Contributor Author

@madtoinou it should be fixed now. I wasn't handling the 'args' correctly; it had changed and I hadn't rerun the tests.

@codecov-commenter

codecov-commenter commented Mar 31, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.10 ⚠️

Comparison is base (80c0e5f) 94.29% compared to head (383ceb5) 94.19%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1668      +/-   ##
==========================================
- Coverage   94.29%   94.19%   -0.10%     
==========================================
  Files         125      126       +1     
  Lines       11636    11683      +47     
==========================================
+ Hits        10972    11005      +33     
- Misses        664      678      +14     
| Impacted Files | Coverage Δ |
| --- | --- |
| darts/dataprocessing/transformers/__init__.py | 100.00% <100.00%> (ø) |
| darts/dataprocessing/transformers/midas.py | 100.00% <100.00%> (ø) |

... and 11 files with indirect coverage changes

@dennisbader
Collaborator

Hi @Beerstabr, and thanks for this PR! 🚀 Before I review I was thinking that we could make this an invertible transformer.

I would be interested to see what happens when we have a higher frequency target and lower frequency covariates.
We'd use MIDAS to transform the target into a multivariate series of this lower frequency. Now we forecast that multivariate series with the covariates and in the end inverse transform to the higher frequency.

What do you think? :)
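
The round trip suggested here can be sketched in plain Python. This is only an illustration under the assumption that the series is a flat list whose length is a multiple of the number of components; `midas_transform` and `midas_inverse_transform` are hypothetical names, not the transformer's actual API.

```python
def midas_transform(values, n_components):
    """Split a flat high-frequency series into rows of n_components values."""
    if len(values) % n_components != 0:
        raise ValueError("series length must be a multiple of n_components")
    return [
        values[i:i + n_components]
        for i in range(0, len(values), n_components)
    ]


def midas_inverse_transform(rows):
    """Flatten the low-frequency rows back into the high-frequency series."""
    return [value for row in rows for value in row]


# Made-up monthly target; each row below is one quarter with 3 components.
monthly_target = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
quarterly_rows = midas_transform(monthly_target, n_components=3)

# A model would forecast further multivariate rows at the quarterly frequency;
# the inverse step maps rows back to the monthly frequency.
restored = midas_inverse_transform(quarterly_rows)
print(quarterly_rows, restored)
```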

@Beerstabr
Contributor Author

@dennisbader sounds good! I'll work on it next week!

@dennisbader
Collaborator

Hi @Beerstabr , just wanted to check in if there were any updates on this one? :)

@Beerstabr
Contributor Author

Hi @dennisbader, I've been swamped with work lately and haven't gotten to it. I would really like to finish it though. No updates for now. I have heard that this kind of technique has been implemented successfully at a large retail firm (giving a significant increase in accuracy), so it should be worthwhile.

@dennisbader
Collaborator

Hi @Beerstabr, thanks for the update. No worries, I totally understand. We can take it over from here.
Thanks again for all the work so far on this very neat feature! 🚀

@madtoinou
Collaborator

Hi @Beerstabr,

Thanks a lot for the work, I am going to take over this PR and try to finish it in time for the next release 🚀.

@madtoinou madtoinou requested a review from hrzn as a code owner June 7, 2023 12:38
@Beerstabr
Contributor Author

@madtoinou thanks! Let me know if something is unclear and I can help to explain it.

@madtoinou
Collaborator

Thank you, the code was clear enough.

In order to reverse the transformation, the number of components generated during the transform is extracted and used to divide the low-frequency period to retrieve the high-frequency period. This approach does not work for periods of a month or longer because of the varying number of days, so I added some rule-based logic (pandas Timedelta does not support the units 'M', 'Y', and 'y' as they "do not represent unambiguous timedelta values durations", which makes me think that there is no other solution).

It still requires a bit of work to properly support multivariate TimeSeries, but that should not be too difficult.
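
A rough sketch of the two paths described in this comment, using only the standard library. It is not the logic that was merged: the frequency labels, the `FIXED_DURATIONS` and `CALENDAR_RULES` tables, and the `infer_high_freq` name are all assumptions made for illustration.

```python
from datetime import timedelta

# Low frequencies with an unambiguous duration: plain division works.
FIXED_DURATIONS = {
    "W": timedelta(weeks=1),
    "D": timedelta(days=1),
    "H": timedelta(hours=1),
}

# Calendar-based low frequencies: no fixed duration, so use lookup rules
# keyed by (low frequency, number of components).
CALENDAR_RULES = {
    ("Q", 3): "M",    # quarter with 3 components -> monthly
    ("Y", 4): "Q",    # year with 4 components -> quarterly
    ("Y", 12): "M",   # year with 12 components -> monthly
}


def infer_high_freq(low_freq, n_components):
    """Recover the high frequency from the low one and the component count."""
    if low_freq in FIXED_DURATIONS:
        return FIXED_DURATIONS[low_freq] / n_components
    try:
        return CALENDAR_RULES[(low_freq, n_components)]
    except KeyError:
        raise ValueError(
            f"no rule for frequency {low_freq!r} with {n_components} components"
        ) from None


hourly = infer_high_freq("D", 24)   # division path
monthly = infer_high_freq("Q", 3)   # rule-based path
print(hourly, monthly)
```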

@madtoinou madtoinou mentioned this pull request Jun 8, 2023
@dennisbader
Collaborator

Closing this as #1820 was merged.

Thanks again for this @Beerstabr! 🚀

Labels
None yet
Projects
Status: Released
Development

Successfully merging this pull request may close these issues.

Add transformer that converts higher frequency time series to lower frequency
4 participants